Harmonic Mean P-value

The harmonic mean ''p''-value (HMP) is a statistical technique for addressing the multiple comparisons problem that controls the strong-sense family-wise error rate (this claim has been disputed). It improves on the power of Bonferroni correction by performing combined tests, i.e. by testing whether ''groups'' of ''p''-values are statistically significant, like Fisher's method. However, unlike Fisher's method, it avoids the restrictive assumption that the ''p''-values are independent. Consequently, it controls the false positive rate when tests are dependent, at the expense of less power (i.e. a higher false negative rate) when tests are independent.

Besides providing an alternative to approaches such as Bonferroni correction that control the stringent family-wise error rate, it also provides an alternative to the widely used Benjamini-Hochberg procedure (BH) for controlling the less stringent false discovery rate. This is because the power of the HMP to detect significant ''groups'' of hypotheses is greater than the power of BH to detect significant ''individual'' hypotheses. There are two versions of the technique: (i) direct interpretation of the HMP as an approximate ''p''-value and (ii) a procedure for transforming the HMP into an asymptotically exact ''p''-value. The approach provides a multilevel test procedure in which the smallest groups of ''p''-values that are statistically significant may be sought.


Direct interpretation of the harmonic mean ''p''-value

The weighted harmonic mean of ''p''-values p_1, \dots, p_L is defined as \overset{\circ}{p} = \frac{\sum_{i=1}^L w_i}{\sum_{i=1}^L w_i/p_i}, where w_1, \dots, w_L are weights that must sum to one, i.e. \sum_{i=1}^L w_i=1. Equal weights may be chosen, in which case w_i=1/L. In general, interpreting the HMP directly as a ''p''-value is anti-conservative, meaning that the false positive rate is higher than expected. However, as the HMP becomes smaller, under certain assumptions, the discrepancy decreases, so that direct interpretation of significance achieves a false positive rate close to that implied for sufficiently small values (e.g. \overset{\circ}{p}<0.05). The HMP is never anti-conservative by more than a factor of e\,\log L for small L, or \log L for large L. However, these bounds represent worst-case scenarios under arbitrary dependence that are likely to be conservative in practice. Rather than applying these bounds, asymptotically exact ''p''-values can be produced by transforming the HMP.
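The definition of the weighted HMP can be sketched in Python as follows. This is a minimal illustration written for this article, not code from any package; the function name `harmonic_mean_p` is hypothetical, and equal weights w_i = 1/L are assumed when none are supplied.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean p-value: sum(w_i) / sum(w_i / p_i).

    Hypothetical helper for illustration. Weights default to 1/L and must
    sum to one, as the definition requires.
    """
    L = len(p_values)
    if L == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / L] * L
    if len(weights) != L:
        raise ValueError("p_values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


example = harmonic_mean_p([0.01, 0.02, 0.5, 0.9])
print(round(example, 4))  # 0.0261
```

Because each p-value enters as w_i/p_i, the smallest p-values dominate the denominator: here the HMP is about 0.026, far below the arithmetic mean of 0.3575.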


Asymptotically exact harmonic mean ''p''-value procedure

Generalized central limit theorem shows that an asymptotically exact ''p''-value, p_{\overset{\circ}{p}}, can be computed from the HMP, \overset{\circ}{p}, using the formula p_{\overset{\circ}{p}} = \int_{1/\overset{\circ}{p}}^\infty f_\textrm{Landau}\left(x\,|\,\log L+0.874,\frac{\pi}{2}\right) \mathrm{d}x. Subject to the assumptions of generalized central limit theorem, this transformed ''p''-value becomes exact as the number of tests, L, becomes large. The computation uses the Landau distribution, whose density function can be written f_\textrm{Landau}(x\,|\,\mu,\sigma) = \frac{1}{\pi\sigma}\int_0^\infty \textrm{e}^{-t\frac{(x-\mu)}{\sigma}-\frac{2}{\pi}t\log t}\,\sin(2t)\,\textrm{d}t. The test is implemented by the p.hmp command of the harmonicmeanp R package; a tutorial is available online. Equivalently, one can compare the HMP to a table of critical values (Table 1). The table illustrates that the smaller the false positive rate, and the smaller the number of tests, the closer the critical value is to the false positive rate.
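To make the transformation concrete, here is a minimal pure-Python sketch; it is not the harmonicmeanp implementation. It assumes the Landau density quoted above and evaluates the tail probability after swapping the order of integration, which gives P(X > a) = \frac{1}{\pi}\int_0^\infty \textrm{e}^{-t(a-\mu)/\sigma-\frac{2}{\pi}t\log t}\,\frac{\sin(2t)}{t}\,\textrm{d}t. The names `landau_sf` and `hmp_exact_p` are hypothetical, and the fixed Simpson grid is only intended for small HMPs, where 1/\overset{\circ}{p} lies well above \log L + 0.874 and the integrand does not oscillate wildly.

```python
import math


def landau_sf(a, mu, sigma, upper=60.0, n=20000):
    """P(X > a) for X with the Landau(mu, sigma) density quoted in the text.

    Integrating the density over x in [a, inf) before integrating over t gives
        (1/pi) * int_0^inf exp(-k*t - (2/pi)*t*log t) * sin(2t)/t dt,  k = (a - mu)/sigma.
    The integral is evaluated with composite Simpson's rule after substituting
    t = u/s with s = max(1, k), so the integrand decays on a unit scale in u.
    Only reliable when a is not far below mu.
    """
    k = (a - mu) / sigma
    s = max(1.0, k)

    def integrand(u):
        if u == 0.0:
            return 2.0 / s  # limit of sin(2t)/t as t -> 0, times dt/du = 1/s
        t = u / s
        return math.exp(-k * t - (2.0 / math.pi) * t * math.log(t)) * math.sin(2.0 * t) / (t * s)

    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / (3.0 * math.pi)


def hmp_exact_p(hmp, L):
    """Asymptotically exact p-value: Landau tail beyond 1/hmp, location log L + 0.874, scale pi/2."""
    return landau_sf(1.0 / hmp, mu=math.log(L) + 0.874, sigma=math.pi / 2)


for L in (10, 1000):
    print(L, hmp_exact_p(0.01, L))
```

For a fixed HMP the transformed ''p''-value is slightly larger than the HMP and grows with L, which is the anti-conservatism of direct interpretation that Table 1 corrects for. A production implementation would use a dedicated Landau routine rather than a fixed quadrature grid.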


Multiple testing via the multilevel test procedure

If the HMP is significant at some level \alpha for a group of L ''p''-values, one may search all subsets of the L ''p''-values for the smallest significant group, while maintaining the strong-sense family-wise error rate. Formally, this constitutes a closed-testing procedure. When \alpha is small (e.g. \alpha<0.05), the following multilevel test based on direct interpretation of the HMP controls the strong-sense family-wise error rate at level approximately \alpha:
1. Define the HMP of any subset \mathcal{R} of the L ''p''-values to be \overset{\circ}{p}_\mathcal{R} = \frac{\sum_{i\in\mathcal{R}} w_i}{\sum_{i\in\mathcal{R}} w_i/p_i}.
2. Reject the null hypothesis that none of the ''p''-values in subset \mathcal{R} are significant if \overset{\circ}{p}_\mathcal{R}\leq\alpha\,w_\mathcal{R}, where w_\mathcal{R}=\sum_{i\in\mathcal{R}}w_i. (Recall that, by definition, \sum_{i=1}^L w_i=1.)
An asymptotically exact version of the above replaces \overset{\circ}{p}_\mathcal{R} in step 2 with p_{\overset{\circ}{p}_\mathcal{R}} = \max\left\{\overset{\circ}{p}_\mathcal{R},\, w_\mathcal{R}\int_{w_\mathcal{R}/\overset{\circ}{p}_\mathcal{R}}^\infty f_\textrm{Landau}\left(x\,|\,\log L+0.874,\frac{\pi}{2}\right)\mathrm{d}x\right\}, where L is the total number of ''p''-values, not just those in subset \mathcal{R}. Since direct interpretation of the HMP is faster, a two-pass procedure may be used: first identify subsets of ''p''-values that are likely to be significant using direct interpretation, then confirm them using the asymptotically exact formula.
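The two steps of the direct-interpretation multilevel test can be sketched as a brute-force search. This is a toy written for this article under the assumption that L is small enough to enumerate every subset (the search is exponential in L); the function names and the example p-values are hypothetical.

```python
from itertools import combinations


def subset_hmp(p_values, weights, subset):
    """Step 1: HMP of a subset R, sum_{i in R} w_i / sum_{i in R} (w_i / p_i)."""
    w_r = sum(weights[i] for i in subset)
    return w_r / sum(weights[i] / p_values[i] for i in subset)


def significant_subsets(p_values, weights, alpha):
    """Step 2: every subset R with HMP_R <= alpha * w_R, listed from smallest size up."""
    L = len(p_values)
    hits = []
    for size in range(1, L + 1):
        for subset in combinations(range(L), size):
            w_r = sum(weights[i] for i in subset)
            if subset_hmp(p_values, weights, subset) <= alpha * w_r:
                hits.append(subset)
    return hits


p = [0.001, 0.3, 0.5, 0.8]
w = [0.25] * 4
hits = significant_subsets(p, w, alpha=0.05)
# hits is ordered by subset size, so the first entry has the smallest size
smallest = [r for r in hits if len(r) == len(hits[0])] if hits else []
print(smallest)  # [(0,)]
```

Since \overset{\circ}{p}_\mathcal{R} = w_\mathcal{R}/\sum_{i\in\mathcal{R}} w_i/p_i, the rejection rule in step 2 is equivalent to \sum_{i\in\mathcal{R}} w_i/p_i \geq 1/\alpha. That sum can only grow as ''p''-values are added, so every superset of a significant subset is also significant; in this example all eight subsets containing the first ''p''-value are rejected.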


Properties of the HMP

The HMP has a range of properties that arise from generalized central limit theorem. It is:
* Robust to positive dependency between the ''p''-values.
* Insensitive to the exact number of tests, ''L''.
* Robust to the distribution of weights, ''w''.
* Most influenced by the smallest ''p''-values.
When the HMP is not significant, neither is any subset of the constituent tests. Conversely, when the multilevel test deems a subset of ''p''-values to be significant, the HMP for all the ''p''-values combined is likely to be significant; this is certain when the HMP is interpreted directly. When the goal is to assess the significance of ''individual'' ''p''-values, so that combined tests concerning ''groups'' of ''p''-values are of no interest, the HMP is equivalent to the Bonferroni procedure but subject to the more stringent significance threshold \alpha_L<\alpha (Table 1). The HMP assumes the individual ''p''-values have (not necessarily independent) standard uniform distributions when their null hypotheses are true. Large numbers of underpowered tests can therefore harm the power of the HMP. While the choice of weights is unimportant for the validity of the HMP under the null hypothesis, the weights influence the power of the procedure. Supplementary Methods §5C of and an online tutorial consider the issue in more detail.
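The first property, robustness to positive dependency, can be probed with a small Monte Carlo sketch written for this article. It assumes one-sided z-tests whose statistics are equicorrelated through a shared Gaussian factor, so every ''p''-value is marginally uniform and all null hypotheses are true; L = 10, the correlation values and the number of simulations are arbitrary choices, and the function names are hypothetical. The printed rates are simulation estimates of how often the directly interpreted HMP falls below \alpha = 0.05, not exact values.

```python
import math
import random


def hmp(p_values):
    """Equal-weight harmonic mean p-value."""
    return len(p_values) / sum(1.0 / p for p in p_values)


def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def null_rejection_rate(L, rho, alpha=0.05, n_sims=20000, seed=1):
    """Fraction of null simulations in which the HMP, read directly, is <= alpha.

    z_i = sqrt(rho) * shared + sqrt(1 - rho) * own_i gives pairwise
    correlation rho >= 0 while keeping each z_i standard normal.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rejections = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        ps = [one_sided_p(a * shared + b * rng.gauss(0.0, 1.0)) for _ in range(L)]
        if hmp(ps) <= alpha:
            rejections += 1
    return rejections / n_sims


for rho in (0.0, 0.5, 0.9):
    print(rho, null_rejection_rate(L=10, rho=rho))
```

Under independence the estimated rate should sit somewhat above 0.05, reflecting the mild anti-conservatism of direct interpretation; as the correlation approaches one, all ''p''-values coincide and the rate approaches \alpha exactly.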


Bayesian interpretations of the HMP

The HMP was conceived by analogy to Bayesian model averaging and can be interpreted as inversely proportional to a model-averaged Bayes factor when combining ''p''-values from likelihood ratio tests.


The harmonic mean rule-of-thumb

I. J. Good reported an empirical relationship between the Bayes factor and the ''p''-value from a likelihood ratio test. For a null hypothesis H_0 nested in a more general alternative hypothesis H_A, he observed that often, \textrm{BF}_i\approx \frac{1}{\gamma\,p_i},\quad 3\tfrac{1}{3}<\gamma<30, where \textrm{BF}_i denotes the Bayes factor in favour of H_A versus H_0. Extrapolating, he proposed a rule of thumb in which the HMP is taken to be inversely proportional to the model-averaged Bayes factor for a collection of L tests with common null hypothesis: \overline{\textrm{BF}}=\sum_{i=1}^L w_i\,\textrm{BF}_i \approx \sum_{i=1}^L \frac{w_i}{\gamma\,p_i} = \frac{1}{\gamma\,\overset{\circ}{p}}. For Good, his rule of thumb supported an interchangeability between Bayesian and classical approaches to hypothesis testing.
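The last step of Good's extrapolation is an algebraic identity: averaging the per-test approximations w_i/(\gamma p_i) gives exactly 1/(\gamma\,\overset{\circ}{p}). The short sketch below, written for this article with hypothetical function names and example ''p''-values, checks this at both ends of Good's range for \gamma.

```python
def harmonic_mean_p(p_values, weights):
    """Weighted HMP: sum(w_i) / sum(w_i / p_i)."""
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


def good_model_averaged_bf(p_values, weights, gamma):
    """Good's rule of thumb test by test: sum_i w_i * BF_i with BF_i ~ 1 / (gamma * p_i)."""
    return sum(w / (gamma * p) for w, p in zip(weights, p_values))


p = [0.001, 0.03, 0.4]
w = [1 / 3] * 3
for gamma in (10 / 3, 30):  # ends of Good's empirical range 3 1/3 < gamma < 30
    averaged = good_model_averaged_bf(p, w, gamma)
    via_hmp = 1 / (gamma * harmonic_mean_p(p, w))
    print(round(gamma, 3), averaged, via_hmp)  # the two columns agree
```

The spread between the two ends of the range (a factor of 9) indicates how loose the rule of thumb is as a calibration.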


Bayesian calibration of ''p''-values

If the distributions of the ''p''-values under the alternative hypotheses follow Beta distributions with parameters \left(0<\xi_i<1, 1\right), a form considered by Sellke, Bayarri and Berger, then the inverse proportionality between the model-averaged Bayes factor and the HMP can be formalized as \overline{\textrm{BF}}=\sum_{i=1}^L \mu_i\,\textrm{BF}_i=\sum_{i=1}^L \mu_i\,\xi_i\,p_i^{\xi_i-1}\approx\bar\xi\sum_{i=1}^L w_i\,p_i^{-1}=\frac{\bar\xi}{\overset{\circ}{p}}, where
* \mu_i is the prior probability of alternative hypothesis i, such that \sum_{i=1}^L\mu_i=1,
* \xi_i/(1+\xi_i) is the expected value of p_i under alternative hypothesis i,
* w_i=u_i/\bar\xi is the weight attributed to ''p''-value i,
* u_i = \left(\mu_i\,\xi_i\right)^{1/(1-\xi_i)} incorporates the prior model probabilities and powers into the weights, and
* \bar\xi = \sum_{i=1}^L u_i normalizes the weights.
The approximation works best for well-powered tests (\xi_i\ll 1).
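The quality of the approximation can be checked numerically. The sketch below, written for this article, compares the exact model-averaged Bayes factor \sum_i \mu_i\,\xi_i\,p_i^{\xi_i-1} with \bar\xi/\overset{\circ}{p}, assuming the weight definition u_i = (\mu_i\,\xi_i)^{1/(1-\xi_i)}, w_i = u_i/\bar\xi; the example ''p''-values, equal prior probabilities and the two values of \xi are arbitrary, and the function names are hypothetical.

```python
def calibrated_bf(p_values, mu, xi):
    """Exact model-averaged Bayes factor under Beta(xi_i, 1) alternatives:
    sum_i mu_i * xi_i * p_i**(xi_i - 1)."""
    return sum(m * x * p ** (x - 1.0) for p, m, x in zip(p_values, mu, xi))


def hmp_approximation(p_values, mu, xi):
    """Approximation xi_bar / HMP, with u_i = (mu_i * xi_i)**(1 / (1 - xi_i))
    and weights w_i = u_i / xi_bar (assumed definitions)."""
    u = [(m * x) ** (1.0 / (1.0 - x)) for m, x in zip(mu, xi)]
    xi_bar = sum(u)
    w = [ui / xi_bar for ui in u]
    hmp = sum(w) / sum(wi / p for wi, p in zip(w, p_values))
    return xi_bar / hmp


p = [0.001, 0.01, 0.2]
mu = [1 / 3] * 3
ratios = {}
for xi_value in (0.01, 0.3):  # well-powered vs. weakly powered alternatives
    xi = [xi_value] * 3
    exact = calibrated_bf(p, mu, xi)
    approx = hmp_approximation(p, mu, xi)
    ratios[xi_value] = approx / exact
    print(xi_value, exact, approx)
```

With \xi_i = 0.01 the two quantities agree to within about 1%, whereas with \xi_i = 0.3 the HMP-based value overstates the exact Bayes factor by more than a factor of two, in line with the remark that the approximation needs \xi_i \ll 1.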


The harmonic mean ''p''-value as a bound on the Bayes factor

For likelihood ratio tests with exactly two degrees of freedom, Wilks' theorem implies that p_i=1/R_i, where R_i is the maximized likelihood ratio in favour of alternative hypothesis i, and therefore \overset{\circ}{p}=1/\bar{R}, where \bar{R} is the weighted mean maximized likelihood ratio, using weights w_1,\dots,w_L. Since R_i is an upper bound on the Bayes factor, \textrm{BF}_i, then 1/\overset{\circ}{p} is an upper bound on the model-averaged Bayes factor: \overline{\textrm{BF}}\leq\frac{1}{\overset{\circ}{p}}. While the equivalence holds only for two degrees of freedom, the relationship between \overset{\circ}{p} and \bar{R}, and therefore \overline{\textrm{BF}}, behaves similarly for other degrees of freedom. Under the assumption that the distributions of the ''p''-values under the alternative hypotheses follow
Beta distributions with parameters \left(1, \kappa_i>1\right), and that the weights w_i=\mu_i, the HMP provides a tighter upper bound on the model-averaged Bayes factor: \overline{\textrm{BF}}\leq \frac{1}{\textrm{e}\,\overset{\circ}{p}}, a result that again reproduces the inverse proportionality of Good's empirical relationship (Held, L. (2019). "On the Bayesian interpretation of the harmonic mean ''p''-value". Proceedings of the National Academy of Sciences USA 116(13): 5855–5856. doi:10.1073/pnas.1900671116).
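The two-degree-of-freedom identity follows because the chi-squared survival function with 2 degrees of freedom is \textrm{e}^{-x/2}, so a statistic of 2\log R_i gives p_i = \textrm{e}^{-\log R_i} = 1/R_i. The sketch below, written for this article, checks that identity and the resulting relation \overset{\circ}{p}=1/\bar{R}; the likelihood ratios and weights are hypothetical example values, and no Bayes factors are computed.

```python
import math


def chi2_2df_sf(x):
    """Survival function of the chi-squared distribution with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


R = [5.0, 40.0, 200.0]  # hypothetical maximized likelihood ratios R_i
w = [0.5, 0.3, 0.2]     # hypothetical weights summing to one

# Wilks: 2 log R_i is approximately chi-squared(2) under the null hypothesis
p = [chi2_2df_sf(2.0 * math.log(r)) for r in R]
hmp = sum(w) / sum(wi / pi for wi, pi in zip(w, p))
r_bar = sum(wi * ri for wi, ri in zip(w, R))

print(p)                 # each p_i equals 1/R_i up to rounding
print(hmp, 1.0 / r_bar)  # HMP equals 1/R_bar, so 1/HMP = R_bar
```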

